Feature Selection for Improved Phone Duration Modeling of Greek Emotional Speech
نویسندگان
چکیده
In the present work we address the problem of phone duration modeling for the needs of emotional speech synthesis. Specifically, relying on ten well known machine learning techniques, we investigate the practical usefulness of two feature selection techniques, namely the Relief and the Correlation-based Feature Selection (CFS) algorithms, for improving the accuracy of phone duration modeling. The feature selection is performed over a large set of phonetic, morphologic and syntactic features. In the experiments, we employed phone duration models, based on decision trees, linear regression, lazy-learning algorithms and meta-learning algorithms, trained on a Modern Greek speech database of emotional speech, which consists of five categories of emotional speech: anger, fear, joy, neutral, sadness. The experimental results demonstrated that feature selection significantly improves the accuracy of phone duration modeling regardless of the type of machine learning algorithm used for phone duration modeling.
منابع مشابه
شبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملImproving of Feature Selection in Speech Emotion Recognition Based-on Hybrid Evolutionary Algorithms
One of the important issues in speech emotion recognizing is selecting of appropriate feature sets in order to improve the detection rate and classification accuracy. In last studies researchers tried to select the appropriate features for classification by using the selecting and reducing the space of features methods, such as the Fisher and PCA. In this research, a hybrid evolutionary algorit...
متن کاملA Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation
Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...
متن کاملImproved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition
Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...
متن کاملTwo-stage phone duration modelling with feature construction and feature vector extension for the needs of speech synthesis
We propose a two-stage phone duration modelling scheme, which can be applied for the improvement of prosody modelling in speech synthesis systems. This scheme builds on a number of independent feature constructors (FCs) employed in the first stage, and a phone duration model (PDM) which operates on an extended feature vector in the second stage. The feature vector, which acts as input to the fi...
متن کامل